Step 1: Prepare data
- Create four made-up variables (name, math, stat, gender) and combine them into a data frame named df1
# Load the tidyverse, which provides tibble(), ggplot(), and %>%
library(tidyverse)
# Make 4 variables
name <- c("Joe", "Ze'ev", "David", "Mike", "Ross", "Woojin", "Inha", "Jih-wen",
          "Mark", "Dennis", "Carol", "Shira", "Mimi", "Amital", "Rachel", "Ariel",
          "Kelly", "RongRong", "Kathy", "Barbara")
math <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 4, 5, 6, 7, 9, 8, 9, 8, 10)
stat <- c(2, 4, 6, 5, 7, 9, 7, 10, 12, 15, 14, 13, 12, 13, 11, 10, 9, 8, 6, 4)
gender <- c("Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
            "Male", "Female", "Female", "Female", "Female", "Female", "Female",
            "Female", "Female", "Female", "Female")
# Combine the 4 variables into a data frame, df1
df1 <- tibble(name, math, stat, gender)
- Let’s check what df1 looks like
df1
# A tibble: 20 × 4
name math stat gender
<chr> <dbl> <dbl> <chr>
1 Joe 1 2 Male
2 Ze'ev 2 4 Male
3 David 3 6 Male
4 Mike 4 5 Male
5 Ross 5 7 Male
6 Woojin 6 9 Male
7 Inha 7 7 Male
8 Jih-wen 8 10 Male
9 Mark 9 12 Male
10 Dennis 10 15 Male
11 Carol 2 14 Female
12 Shira 4 13 Female
13 Mimi 5 12 Female
14 Amital 6 13 Female
15 Rachel 7 11 Female
16 Ariel 9 10 Female
17 Kelly 8 9 Female
18 RongRong 9 8 Female
19 Kathy 8 6 Female
20 Barbara 10 4 Female
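Because gender is simply ten "Male" values followed by ten "Female" values, the vector can also be built with rep() instead of typing every element. The sketch below is an optional shortcut rather than part of the steps above; gender_short is a hypothetical name used only for illustration.

```r
# Hypothetical shortcut: build the same gender pattern with rep()
gender_short <- rep(c("Male", "Female"), each = 10)

# Check the result: 20 values in total, 10 per category
length(gender_short)
table(gender_short)
```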
Assign data: ggplot()
- You need to specify a mapping, which means assigning variables to visual properties within aes()
- ggplot() has two main arguments: data and mapping
- First, let’s assign df1 as the data to use
- This gives us the canvas on which we will draw a graph
ggplot(data = df1)

- The canvas is blank because no geometric object has been added yet
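One way to see why nothing is drawn yet is to count the layers stored in the plot object. This is a minimal sketch, assuming the ggplot2 package is installed; toy, p, and p2 are hypothetical names, and inspecting the layers element is only an illustrative check.

```r
library(ggplot2)

# Hypothetical toy data standing in for df1
toy <- data.frame(math = c(1, 2, 3), stat = c(2, 4, 6))

p <- ggplot(data = toy)  # data assigned, nothing drawn yet
length(p$layers)         # number of layers added so far

p2 <- p + geom_point(aes(x = math, y = stat))
length(p2$layers)        # number of layers after adding geom_point()
```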
Step 3: Select a graph
- Choose a geometric object: geom_point() draws a scatter plot
- Add geom_point() to the plot with +
ggplot(data = df1) +
  aes(x = math, y = stat) +
  geom_point()

- Here, aes(x = math, y = stat) is added as a separate component with +, which sets the default mapping for the whole plot
- Alternatively, you can put aes(x = math, y = stat) inside geom_point()
- This means that the mapping is specified within geom_point()
- Within geom_point(), you specify the mapping by typing mapping =
- Within aes(), you link each visual element (such as a dot, line, or surface) to a particular variable
→ Here, the x axis is linked to math, the y axis is linked to stat, and each dot is colored by gender
ggplot(data = df1) +
  geom_point(mapping = aes(x = math,
                           y = stat,
                           color = gender))

- data = is the first argument of ggplot() and mapping = is the first argument of geom_point(), so both argument names can be omitted
→ You can rewrite the R code above as follows:
ggplot(df1) +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))
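The argument names can be dropped because R matches unnamed arguments by position. The sketch below compares the named and unnamed versions; it assumes ggplot2 is installed, and toy, p_named, and p_short are hypothetical names. Note that in geom_point() the first argument is mapping, not data, so a data frame passed to geom_point() would still need data =.

```r
library(ggplot2)

# Hypothetical toy data standing in for df1
toy <- data.frame(math = c(1, 2, 3),
                  stat = c(2, 4, 6),
                  gender = c("Male", "Female", "Male"))

# Argument names written out
p_named <- ggplot(data = toy) +
  geom_point(mapping = aes(x = math, y = stat, color = gender))

# Argument names omitted: matched by position
p_short <- ggplot(toy) +
  geom_point(aes(x = math, y = stat, color = gender))

# Both plots carry the same data and the same mapped aesthetics
identical(p_named$data, p_short$data)
identical(names(p_named$layers[[1]]$mapping),
          names(p_short$layers[[1]]$mapping))
```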
- You can use the pipe operator (%>% or |>) to link commands together
- This tells R to do something and then do something else to the output of the first something
- Chaining functions together like this becomes very useful as your tasks become more complicated
- Using the pipe operator, you can rewrite the code above as follows:
df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))
Mapping aes() and omission of R code
Summary on mapping aes()
・In Step 2: Assign the variables, aes(x = math, y = stat) is added with + after ggplot(data = df1)
ggplot(data = df1) +
  aes(x = math, y = stat) +
  geom_point()
・aes(x = math, y = stat) can be placed not only within ggplot() but also within geom_point()
- In sum, you have two ways of mapping aes():
Mapping aes() within ggplot()
df1 %>%
  ggplot(aes(x = math,
             y = stat,
             color = gender)) +
  geom_point()
Mapping aes() within geom_point()
df1 %>%
  ggplot() +
  geom_point(aes(x = math,
                 y = stat,
                 color = gender))
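- Which way you choose depends on what you want to do: a mapping placed in ggplot() is inherited by every layer added afterwards, while a mapping placed in geom_point() applies only to that layer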
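The difference between the two placements shows up as soon as a second layer is added. The sketch below is an illustration under assumptions: it uses geom_smooth(), which does not appear in the steps above, as the second layer, and toy, p_global, and p_local are hypothetical names. It assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical toy data standing in for df1
toy <- data.frame(math = c(1, 2, 3, 4, 5, 6),
                  stat = c(2, 4, 5, 9, 8, 7),
                  gender = rep(c("Male", "Female"), each = 3))

# Mapping in ggplot(): inherited by geom_point() AND geom_smooth(),
# so each gender gets its own colored points and its own fitted line
p_global <- ggplot(toy, aes(x = math, y = stat, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# color mapped only in geom_point(): points are colored by gender,
# but geom_smooth() fits a single line to all observations
p_local <- ggplot(toy, aes(x = math, y = stat)) +
  geom_point(aes(color = gender)) +
  geom_smooth(method = "lm", se = FALSE)
```

Printing p_global and p_local side by side makes the inheritance visible.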
How pipes (%>% or |>) are used
・Pipes (%>% or |>) allow you to express a sequence of multiple operations
・In simple chains like the ones in this section, %>% and |> can be used interchangeably
・Pipes can greatly simplify your code and make your operations more intuitive
・The pipe operator (%>%) is automatically imported when you load the {tidyverse} library
・The pipe operator (%>%) comes from the {magrittr} package
・The {magrittr} package is included in the {tidyverse} package
・The native pipe (|>) is built into R (version 4.1.0 or later) and needs no package
→ To use the pipe operator (%>%), you need to load either of the following packages:
library(magrittr)
library(tidyverse)
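For comparison, the native pipe (|>) works without loading any package. This is a minimal sketch, assuming R 4.1.0 or later; result is a hypothetical name.

```r
# Native pipe: part of base R from version 4.1.0, no library() call needed
result <- c(4, 9, 16) |>
  sqrt() |>   # square roots: 2, 3, 4
  sum()       # 2 + 3 + 4
result
```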
The pipe operator (%>%) automatically passes the output of one line to the next line, where it is used as the first argument
Let’s take a look at an example of using the pipe
- Generate a vector of integers from 1 to 10
1:10
[1] 1 2 3 4 5 6 7 8 9 10
- Calculate 1 + 2 + ... + 10
sum(1:10)
[1] 55
- If you use the pipe (%>%), you write the R code as follows:
1:10 %>%   # Generate a vector from 1 to 10
  sum()    # Add them all
[1] 55
- If you also want to calculate the square root of the sum:
1:10 %>%     # Generate a vector from 1 to 10
  sum() %>%  # Add them all
  sqrt()     # Calculate the square root
[1] 7.416198
- Generate a vector from 1 to 10 → Add them all → Calculate the square root
- Written this way, the sequence of operations is easier to interpret
If you don’t use pipes, the same calculation looks like this:
sqrt(sum(1:10))
[1] 7.416198
- This is less intuitive because you have to read from the inside out
- Calculate the square root ← Add them all ← Generate a vector
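The comparison above can be put side by side with a third common style, storing intermediate results in named objects. This is a minimal sketch, assuming the magrittr package is installed; nested, total, stepwise, and piped are hypothetical names.

```r
library(magrittr)

# Style 1: nested calls (read from the inside out)
nested <- sqrt(sum(1:10))

# Style 2: intermediate objects (read top to bottom, but needs extra names)
total <- sum(1:10)
stepwise <- sqrt(total)

# Style 3: pipe (read top to bottom, no intermediate names)
piped <- 1:10 %>%
  sum() %>%
  sqrt()

# All three styles give the same value
all.equal(nested, stepwise)
all.equal(nested, piped)
```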
How to interpret R code with pipes (%>%)
・You can interpret the R code you made in 4.1 A simple scatter plot as follows:
df1 %>%                       # Use df1 as data
  ggplot(aes(x = math,        # Assign x = math
             y = stat,        # Assign y = stat
             color = gender)) +  # Dots are colored by gender
  geom_point()                # Draw a scatter plot
・df1 %>% ggplot() means that the first argument of ggplot() is df1
Interpretation of the R code:
・Use df1 as data
→ Assign x = math
→ Assign y = stat
→ Dots are colored by gender
→ Draw a scatter plot
・You don’t have to go backward in interpreting the R code
・R code with pipes (%>%) is intuitive and easy to follow
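The statement that the pipe supplies the first argument of ggplot() can be checked directly. The sketch below assumes ggplot2 and magrittr are installed; toy, p_piped, and p_direct are hypothetical names. Because %>% binds more tightly than +, the data is piped into ggplot() first and geom_point() is then added to the result.

```r
library(ggplot2)
library(magrittr)

# Hypothetical toy data standing in for df1
toy <- data.frame(math = c(1, 2, 3), stat = c(2, 4, 6))

# Piped version: toy becomes the first argument (data) of ggplot()
p_piped <- toy %>%
  ggplot(aes(x = math, y = stat)) +
  geom_point()

# Direct version: toy is passed as the first argument by hand
p_direct <- ggplot(toy, aes(x = math, y = stat)) +
  geom_point()

# Both plots hold the same data
identical(p_piped$data, p_direct$data)
```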